
B200 Minimax FP8 vLLM upgrade #947

Merged
cquil11 merged 22 commits into main from nv/minimax-vllm018
Apr 3, 2026

Conversation

@kedarpotdar-nv
Collaborator

@kedarpotdar-nv kedarpotdar-nv commented Mar 26, 2026

Summary

Upgrade MiniMax-M2.5 FP8 B200 vLLM benchmark configuration from v0.17.0 to v0.19.0 with expanded search space and tuned serving parameters.

Changes

Image Upgrade

  • Update vLLM image from v0.17.0-cu130 to v0.19.0-cu130

Search Space Updates (nvidia-master.yaml)

  • Expand concurrency ranges for existing tp:2 and tp:4 entries (up to 512 for 1k1k, up to 256 for 8k1k)
  • Add tp:2 ep:2 and tp:4 ep:4 search-space entries for 1k1k seq-len config
  • Remove ISL 1024 / OSL 8192 seq-len config

Benchmark Script Updates (minimaxm2.5_fp8_b200.sh)

  • Remove VLLM_USE_FLASHINFER_MOE_FP8=0 and VLLM_MOE_USE_DEEP_GEMM=0 env vars
  • Add VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
  • Fix EP flag condition (-gt 1 instead of -ge 1)
  • Add --kv-cache-dtype fp8
  • Add --max-cudagraph-capture-size 2048
  • Add --max-num-batched-tokens based on ISL
  • Add --stream-interval 20
  • Reduce --gpu-memory-utilization from 0.95 to 0.90
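
Taken together, the script changes above would shape the serve arguments roughly like this. This is only a sketch: the variable names (`ISL`, `EP_SIZE`, `TP_SIZE`) and the ISL-to-batched-tokens mapping are assumptions for illustration, not copied from `minimaxm2.5_fp8_b200.sh`.

```shell
#!/bin/sh
# Sketch of how the updated flags might be assembled.
# ISL, EP_SIZE, TP_SIZE and the token mapping are assumed, not from the repo.
ISL=${ISL:-1024}
EP_SIZE=${EP_SIZE:-1}
TP_SIZE=${TP_SIZE:-2}

export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl

# --max-num-batched-tokens scaled from ISL (illustrative thresholds only)
if [ "$ISL" -ge 8192 ]; then
  MAX_BATCHED_TOKENS=16384
else
  MAX_BATCHED_TOKENS=8192
fi

ARGS="--tensor-parallel-size $TP_SIZE"
# EP flag only when expert parallelism is actually enabled (-gt 1, not -ge 1)
if [ "$EP_SIZE" -gt 1 ]; then
  ARGS="$ARGS --enable-expert-parallel"
fi
ARGS="$ARGS --kv-cache-dtype fp8 --max-cudagraph-capture-size 2048"
ARGS="$ARGS --max-num-batched-tokens $MAX_BATCHED_TOKENS"
ARGS="$ARGS --stream-interval 20 --gpu-memory-utilization 0.90"
echo "$ARGS"
```

With the defaults above (EP_SIZE=1), the expert-parallel flag is omitted, matching the corrected `-gt 1` guard.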

Validated Run

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23694341288

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work! Thank you

@kedarpotdar-nv kedarpotdar-nv requested a review from a team March 26, 2026 01:19
@functionstackx
Contributor

@claude solve the merge conflicts

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 28, 2026

Claude finished @functionstackx's task in 2m 0s


Merge Conflicts Resolved

Pushed commit f12f56d to nv/minimax-vllm018.
Resolved merge conflicts in 3 files:
- .github/configs/nvidia-master.yaml: keep expanded search space from PR
- benchmarks/single_node/minimaxm2.5_fp8_b200.sh: keep --kv-cache-dtype fp8 and --stream-interval 20 with --no-enable-prefix-caching from main
- perf-changelog.yaml: keep both main's new entries and PR's entry

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Contributor

@functionstackx functionstackx left a comment


hi @faradawn @kedarpotdar-nv can you update & merge your existing recipe PR first before we merge this InferenceX PR? vllm-project/recipes#272

Usually we allow the InferenceX PR to merge first, since the recipe PR lands as a fast follow. But this time the recipes PR needs to merge first, because the previous PR's recipe update PR still hasn't merged.

@functionstackx
Contributor

@claude remove isl 1024 osl: 8192

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 29, 2026

Claude finished @functionstackx's task in 57s


Remove ISL 1024 OSL 8192 Configuration

  • Identify where ISL 1024 / OSL 8192 is configured
  • Remove the seq-len config block from nvidia-master.yaml (lines 3101-3105)
  • Update perf-changelog.yaml with removal note
  • Commit and push changes

Removed the ISL 1024 / OSL 8192 seq-len config (with tp:2 and tp:4 search-space entries) from minimaxm2.5-fp8-b200-vllm in nvidia-master.yaml. The 1k1k and 8k1k configs remain.

Pushed commit 83706ae to nv/minimax-vllm018.

@functionstackx
Contributor

perf didn't change much

image


@functionstackx
Contributor

@kedarpotdar-nv can we not use a nightly image here? The MiniMax M2 arch has been around for multiple months now, and from looking at the master.yaml files, MiniMax is all using release images

@kedarpotdar-nv
Collaborator Author

@Klaud-Cold please remove export VLLM_USE_DEEP_GEMM=0 from minimax b200 fp8 vllm script.

@functionstackx
Contributor

@kedarpotdar-nv it is @ claude

@kedarpotdar-nv kedarpotdar-nv changed the title [WIP] B200 Minimax FP8 vllm upgrade B200 Minimax FP8 vllm upgrade Apr 3, 2026
@kedarpotdar-nv
Collaborator Author

@kedarpotdar-nv can we not use a nightly image here? The MiniMax M2 arch has been around for multiple months now, and from looking at the master.yaml files, MiniMax is all using release images

@functionstackx this PR is ready for review. We are using the release 0.19 container now.

@cquil11
Collaborator

cquil11 commented Apr 3, 2026

@claude add pr description in accordance with perf changelog

- { tp: 2, ep: 2, conc-start: 4, conc-end: 256 }
- { tp: 4, conc-start: 4, conc-end: 512 }
- { tp: 4, ep: 4, conc-start: 16, conc-end: 64 }
# - { tp: 8, conc-start: 4, conc-end: 8 }
Collaborator


rm

Collaborator


remove this comment @claude

Collaborator

@Klaud-Cold Klaud-Cold Apr 3, 2026


Claude finished @cquil11's task in 40s


Remove Comment

  • Remove commented-out line at nvidia-master.yaml:3119 (# - { tp: 8, conc-start: 4, conc-end: 8 })
  • Commit and push

Removed the commented-out tp:8 search-space entry. Pushed commit 4f15004 to nv/minimax-vllm018.

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 3, 2026

Claude finished @cquil11's task in 39s


Add PR Description

  • Read perf-changelog.yaml for this PR's entries
  • Read changed files for context
  • Update PR description

PR description has been updated with a summary of all changes from the perf-changelog and diffs: image upgrade (v0.17.0 → v0.19.0), search space expansions, new EP entries, removed seq-len config, and all benchmark script parameter changes.

export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl

- if [ "$EP_SIZE" -ge 1 ]; then
+ if [ "$EP_SIZE" -gt 1 ]; then
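
The corrected guard is easy to sanity-check in isolation: with `-ge 1` the expert-parallel branch fires even for EP_SIZE=1, while `-gt 1` fires only for real expert parallelism. A minimal check (only `EP_SIZE` is taken from the script above):

```shell
# With EP_SIZE=1 the old test (-ge 1) wrongly succeeds;
# the corrected test (-gt 1) does not.
EP_SIZE=1
if [ "$EP_SIZE" -ge 1 ]; then old=yes; else old=no; fi
if [ "$EP_SIZE" -gt 1 ]; then new=yes; else new=no; fi
echo "old=$old new=$new"   # prints: old=yes new=no
```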
Collaborator


good catch lol

- config-keys:
- minimaxm2.5-fp8-b200-vllm
description:
- "Update vLLM image from v0.17.0 to v0.18.0 for MiniMax-M2.5 FP8 B200"
Collaborator


update to 0.19.0 @claude
also change pr desc to reflect this

Collaborator

@Klaud-Cold Klaud-Cold Apr 3, 2026


Claude finished @cquil11's task in 52s


Update perf-changelog to 0.19.0 & PR Description

  • Read perf-changelog.yaml to find the entry to update
  • Update version from v0.18.0 → v0.19.0 in perf-changelog.yaml
  • Update PR description to reflect v0.19.0
  • Commit and push

Updated the perf-changelog entry (v0.18.0 → v0.19.0) and the PR description. Pushed commit 0d734c1 to nv/minimax-vllm018.

github-actions bot and others added 2 commits April 3, 2026 16:27
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>

@cquil11 cquil11 left a comment


lgtm


@cquil11 cquil11 merged commit bddbf40 into main Apr 3, 2026
16 checks passed
@cquil11 cquil11 deleted the nv/minimax-vllm018 branch April 3, 2026 16:31
